Enhancing Perception and Immersion in Pre-Captured Environments through Learning-Based Eye Height Adaptation
Pre-captured immersive environments recorded with omnidirectional cameras support a
wide range of virtual reality applications. Previous research has shown that
manipulating the eye height in egocentric virtual environments can
significantly affect distance perception and immersion. However, the influence
of eye height in pre-captured real environments has received less attention due
to the difficulty of altering the perspective after finishing the capture
process. To explore this influence, we first conduct a pilot study that
captures real environments with multiple eye heights and asks participants to
judge the egocentric distances and immersion. If a significant influence is
confirmed, an effective image-based approach to adapt pre-captured real-world
environments to the user's eye height would be desirable. Motivated by the
study, we propose a learning-based approach for synthesizing novel views for
omnidirectional images with altered eye heights. This approach employs a
multitask architecture that learns depth and semantic segmentation in two
formats, and generates high-quality depth and semantic segmentation to
facilitate the inpainting stage. With the improved omnidirectional-aware
layered depth image, our approach synthesizes natural and realistic visuals for
eye height adaptation. Quantitative and qualitative evaluations show favorable
results against state-of-the-art methods, and an extensive user study verifies
improved perception and immersion for pre-captured real-world environments.
Comment: 10 pages, 13 figures, 3 tables, submitted to ISMAR 202
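The geometric core of eye height adaptation can be illustrated as a per-pixel reprojection: lift an equirectangular pixel to 3D using its depth, raise the virtual camera by `delta_h`, and project the point back. This is a minimal sketch under assumptions the abstract does not state (y-up axes, a longitude/latitude equirectangular layout, depth measured along the viewing ray), and the function names are hypothetical. It is not the authors' method: it forward-warps single pixels only and leaves disoccluded regions empty, which is the gap the layered depth image and inpainting stage described above are meant to fill.

```python
import math


def pixel_to_sphere(u, v, width, height):
    """Map an equirectangular pixel (u, v) to (longitude, latitude) in radians.

    Assumed layout: u grows with longitude in [-pi, pi), v = 0 is the top row
    (latitude +pi/2), and integer coordinates denote pixel centers.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    return lon, lat


def sphere_to_pixel(lon, lat, width, height):
    """Inverse of pixel_to_sphere."""
    u = (lon + math.pi) / (2.0 * math.pi) * width - 0.5
    v = (math.pi / 2.0 - lat) / math.pi * height - 0.5
    return u, v


def reproject_eye_height(u, v, depth, delta_h, width, height):
    """Where does pixel (u, v) land after raising the camera by delta_h?

    depth is the (assumed metric) distance along the viewing ray and the
    y axis points up. Returns the new (u, v) and the new ray distance.
    """
    lon, lat = pixel_to_sphere(u, v, width, height)
    horizontal = depth * math.cos(lat)  # horizontal distance to the point
    vertical = depth * math.sin(lat)    # height of the point above the camera
    # Raising the camera by delta_h lowers the point relative to the camera;
    # the horizontal distance and the longitude stay the same.
    vertical_new = vertical - delta_h
    lat_new = math.atan2(vertical_new, horizontal)
    depth_new = math.hypot(horizontal, vertical_new)
    u_new, v_new = sphere_to_pixel(lon, lat_new, width, height)
    return u_new, v_new, depth_new


# Usage: a point on the horizon 1 m away, seen from a camera raised by 1 m,
# now appears 45 degrees below the horizon and sqrt(2) m away.
print(reproject_eye_height(3.0, 1.5, 1.0, 1.0, width=8, height=4))
```

Because the camera moves purely vertically, only the latitude (row) and ray distance of each pixel change; columns stay fixed.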
3D Reconstruction of Sculptures from Single Images via Unsupervised Domain Adaptation on Implicit Models
Acquiring the virtual equivalents of exhibits such as sculptures for virtual
reality (VR) museums can be labour-intensive and sometimes infeasible.
Deep-learning-based 3D reconstruction approaches allow us to recover 3D shapes from
2D observations, among which single-view-based approaches can reduce the need
for human intervention and specialised equipment in acquiring 3D sculptures for
VR museums. However, there exist two challenges when attempting to use the
well-researched human reconstruction methods: limited data availability and
domain shift. Since sculptures usually represent humans, we propose an
unsupervised 3D domain adaptation method for adapting a single-view 3D
implicit reconstruction model from the source (real-world humans) to the target
(sculptures) domain. We compare the generated shapes with those of other methods
and conduct ablation studies as well as a user study to demonstrate the
effectiveness of our adaptation method. We also deploy our results in a VR
application.
On the Design Fundamentals of Diffusion Models: A Survey
Diffusion models are generative models that gradually add and remove noise
to learn the underlying distribution of the training data for data generation.
The components of diffusion models have received significant attention, and
many design choices have been proposed for them. Existing reviews have
primarily focused on higher-level solutions and therefore cover the design
fundamentals of individual components less thoroughly. This study seeks to
address this gap by providing a comprehensive and coherent review of
component-wise design choices in diffusion models. Specifically, we organize
this review according to the three key components of diffusion models, namely
the forward process, the reverse process, and the sampling procedure. This
allows us to provide a fine-grained perspective of diffusion models,
benefiting future studies in the analysis of individual components, the
applicability of design choices, and the implementation of diffusion models.
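As one concrete forward-process design choice, the sketch below implements the closed-form Gaussian noising step used by DDPM, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with a linear beta schedule. The schedule endpoints (1e-4 to 0.02 over 1000 steps) follow the commonly used DDPM defaults and are an assumption here, not a choice taken from this survey; the function names are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T spaced linearly (DDPM-style defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form; also return the noise used.

    x0 is a flat list of floats and t is a 0-based step index.
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]
    return xt, noise


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]
xt, eps = q_sample(x0, t=999, alpha_bars=alpha_bars, rng=rng)
# Near t = T, alpha_bar is tiny, so x_t is almost pure Gaussian noise.
print(alpha_bars[-1], xt)
```

Swapping in a different schedule (for example a cosine schedule) changes only how `betas` is built; `q_sample` stays the same, which is what makes the forward process a separable design component.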
Automatic Dance Generation System Considering Sign Language Information
In recent years, thanks to the development of 3DCG animation editing tools (e.g. MikuMikuDance), many 3D character dance animations have been created by amateur users. However, it is very difficult to create choreography from scratch without any technical knowledge. Shiratori et al. [2006] developed an automatic dance generation system that considers the rhythm and intensity of dance motions. However, each segment is selected randomly from a database, so the generated dance motion carries no linguistic or emotional meaning. Takano et al. [2010] developed a human motion generation system that considers motion labels. However, they use simple labels such as “running” or “jump”, so the system cannot generate motions that express emotion. In practice, professional dancers choreograph based on musical features or lyrics, expressing emotion and how they feel about the music. Our work aims to make it easy to generate more emotional dance motion; to this end, we use the linguistic information in lyrics to generate dance motion.
In this paper, we propose a system that generates sign dance motion from continuous sign language motion based on the lyrics of a song. As a music visualization application, this system could help deaf people experience music.
Arbitrary view action recognition via transfer dictionary learning on synthetic training data
Human action recognition is an important problem in robotic vision. Traditional recognition algorithms usually require knowledge of the view angle, which is not always available in robotic applications such as active vision. In this paper, we propose a new framework to recognize actions from arbitrary views. A main feature of our algorithm is that view invariance is learned from synthetic 2D and 3D training data using transfer dictionary learning. This guarantees the availability of training data and removes the need to obtain real-world video from specific viewing angles. The result of the process is a dictionary that can project real-world 2D video into a view-invariant sparse representation, which facilitates the training of a view-invariant classifier. Experimental results on the IXMAS and N-UCLA datasets show significant improvements over existing algorithms.
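Projecting a feature vector into a sparse representation over a fixed dictionary is a lasso problem. The sketch below solves it with ISTA (iterative soft-thresholding), a standard sparse-coding solver swapped in for illustration: the abstract does not say which solver the authors use, the dictionary here is a toy input rather than one learned by transfer dictionary learning, and the function names are hypothetical.

```python
def soft_threshold(x, threshold):
    """Proximal operator of threshold * |x| (shrinks x toward zero)."""
    if x > threshold:
        return x - threshold
    if x < -threshold:
        return x + threshold
    return 0.0


def ista_sparse_code(dictionary, signal, lam=0.1, num_iters=200):
    """Approximately solve min_a 0.5 * ||D a - y||^2 + lam * ||a||_1 with ISTA.

    dictionary is an m x k list of rows (columns are atoms); signal has length m.
    The step size 1 / ||D||_F^2 is a safe, conservative choice because the
    squared Frobenius norm upper-bounds the largest eigenvalue of D^T D.
    """
    m, k = len(dictionary), len(dictionary[0])
    step = 1.0 / sum(v * v for row in dictionary for v in row)
    code = [0.0] * k
    for _ in range(num_iters):
        # Residual r = D a - y and gradient g = D^T r of the quadratic term.
        residual = [sum(dictionary[i][j] * code[j] for j in range(k)) - signal[i]
                    for i in range(m)]
        grad = [sum(dictionary[i][j] * residual[i] for i in range(m))
                for j in range(k)]
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        code = [soft_threshold(code[j] - step * grad[j], step * lam)
                for j in range(k)]
    return code


# Toy 2-atom dictionary: the first coefficient converges to 1 - lam = 0.9,
# and the small second component is shrunk to exactly 0.0.
D = [[1.0, 0.0],
     [0.0, 1.0]]
print(ista_sparse_code(D, [1.0, -0.05], lam=0.1))
```

The exact zeros produced by soft-thresholding are what make the representation sparse; a classifier would then be trained on such codes.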
U3DS: Unsupervised 3D Semantic Scene Segmentation
Contemporary point cloud segmentation approaches largely rely on richly
annotated 3D training data. However, it is both time-consuming and challenging
to obtain consistently accurate annotations for such 3D scene data. Moreover,
there is still a lack of investigation into fully unsupervised scene
segmentation for point clouds, especially for holistic 3D scenes. This paper
presents U3DS as a step towards completely unsupervised point cloud
segmentation for any holistic 3D scene. To achieve this, U3DS applies a
generalized unsupervised segmentation method to both objects and background
across indoor and outdoor static 3D point clouds. It requires no model
pre-training and uses only the inherent information of the point cloud to
achieve full 3D scene segmentation. Our approach first generates superpoints
based on the geometric characteristics of each scene. These superpoints are
then learned through a spatial clustering-based method, followed by iterative
training with pseudo-labels generated from the cluster centroids. Moreover, by
leveraging the invariance and equivariance of volumetric representations, we
apply geometric transformations to voxelized features to provide two sets of
descriptors for robust representation learning. Finally, our evaluation shows
state-of-the-art results on the ScanNet and SemanticKITTI benchmark datasets
and competitive results on S3DIS.
Comment: 10 pages, 4 figures, accepted to IEEE/CVF Winter Conference on
Applications of Computer Vision (WACV) 202
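The superpoint and pseudo-labeling steps of U3DS can be illustrated with a toy sketch: average point features within each superpoint, cluster the superpoint features with k-means, and give every point the cluster label of its superpoint. Plain k-means seeded with the first k vectors is an assumption for illustration; the abstract does not specify U3DS's geometric superpoint generation, its spatial clustering method, or its feature learning, and the function names are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features, k, num_iters=20):
    """Plain k-means; the first k feature vectors seed the centroids."""
    centroids = [list(f) for f in features[:k]]
    labels = [0] * len(features)
    for _ in range(num_iters):
        # Assignment step: nearest centroid for every feature vector.
        labels = [min(range(k), key=lambda c: squared_distance(f, centroids[c]))
                  for f in features]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return centroids, labels


def superpoint_pseudo_labels(point_features, superpoint_ids, k, num_iters=20):
    """Cluster superpoint mean features and broadcast the labels to points."""
    groups = {}
    for feat, sp in zip(point_features, superpoint_ids):
        groups.setdefault(sp, []).append(feat)
    sp_keys = sorted(groups)
    sp_features = [[sum(vals) / len(groups[s]) for vals in zip(*groups[s])]
                   for s in sp_keys]
    _, sp_labels = kmeans(sp_features, k, num_iters)
    label_of = dict(zip(sp_keys, sp_labels))
    # Every point inherits the pseudo-label of the superpoint it belongs to.
    return [label_of[sp] for sp in superpoint_ids]


points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1], [5.0, 5.1]]
superpoints = [0, 0, 1, 1, 2, 3]
print(superpoint_pseudo_labels(points, superpoints, k=2))  # -> [0, 0, 1, 1, 0, 1]
```

In an iterative scheme like the one the abstract describes, these pseudo-labels would supervise a network whose updated features are re-clustered in the next round.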